Abstract. This paper proposes and analyzes fully data-driven methods for inference
about the mean function of a stochastic process from a sample of independent
trajectories of the process, observed at discrete time points and corrupted by additive 
random error. The proposed method uses thresholded least squares estimators rela- 
tive to an approximating function basis. The variable threshold levels are estimated 
from the data and the basis is chosen via cross-validation from a library of bases. The 
resulting estimates adapt to the unknown sparsity of the mean function relative to 
the selected approximating basis, both in terms of the mean squared error and supre- 
mum norm. These results are based on novel oracle inequalities. In addition, uniform 
confidence bands for the mean function of the process are constructed. The bands 
also adapt to the unknown regularity of the mean function, are easy to compute, and 
do not require explicit estimation of the covariance operator of the process. The sim- 
ulation study that complements the theoretical results shows that the new method 
performs very well in practice, and is robust against large variations introduced by 
the random error terms. 

Keywords: Stochastic processes; nonparametric mean estimation; thresholded esti- 
mators; functional data; oracle inequalities; adaptive inference; uniform confidence 
bands. 

1. Introduction 

In this paper we develop and analyze new methodology for inference about the mean 
of a stochastic process from data that consist of independent realizations of a stochastic
process observed at discrete times, where each observation is contaminated by an
additive error term. Formally, let $\{X(t),\ 0 \le t \le 1\}$ be a stochastic process with mean
function

$$f(t) = E[X(t)]$$

and covariance function

$$\Gamma(s,t) = \mathrm{Cov}(X(s), X(t))$$

for all $0 \le s, t \le 1$. We denote the zero mean process $X(t) - f(t)$ by $Z(t)$. We observe
$Y_{ij}$ at times $t_j$, for $1 \le i \le n$, $1 \le j \le m$, that are of the form

$$Y_{ij} = X_i(t_j) + \varepsilon_{ij}, \tag{1}$$

where $X_i(t)$, with mean $f(t)$, are independent random realizations of the process $X(t)$.
We assume that $\varepsilon_{ij}$ are independent across $i$ and $j$ with zero mean and variance

$$E[\varepsilon_{ij}^2] = \sigma_\varepsilon^2.$$
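As a concrete illustration of model (1), the following minimal sketch simulates noisy, discretely observed curves. It is not taken from the paper: the Brownian bridge choice for $Z$, the grid $t_j = j/m$, the sine mean function and the names `brownian_bridge` and `simulate` are assumptions made only for this example.

```python
import math
import random

def brownian_bridge(m, rng):
    """One Brownian bridge path at t_j = j/m, j = 1..m (so the path ends at 0)."""
    w, path = 0.0, []
    for _ in range(m):
        w += rng.gauss(0.0, math.sqrt(1.0 / m))  # Brownian motion increment
        path.append(w)
    # B(t) = W(t) - t * W(1)
    return [path[j] - (j + 1) / m * path[-1] for j in range(m)]

def simulate(n, m, f, sigma_eps, seed=0):
    """Return the grid t and Y[i][j] = f(t_j) + Z_i(t_j) + eps_ij, as in model (1)."""
    rng = random.Random(seed)
    t = [(j + 1) / m for j in range(m)]
    Y = []
    for _ in range(n):
        z = brownian_bridge(m, rng)
        Y.append([f(t[j]) + z[j] + rng.gauss(0.0, sigma_eps) for j in range(m)])
    return t, Y

# Hypothetical example: n = 50 curves on m = 64 points with a sine mean.
t, Y = simulate(n=50, m=64, f=lambda s: math.sin(2 * math.pi * s), sigma_eps=0.3)
```

Each row of `Y` is one observed trajectory; averaging the rows pointwise gives the ensemble average of the observations.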

In this paper we propose new methods for estimating and constructing confidence 
bands for $f$. Although the estimation of $f$ has received considerable attention over the last
decade, the theoretical study of data adaptive estimators in model (1) is still open to
investigation. In contrast with the abundance of methods for estimating $f$, methods
for constructing confidence bands for $f$ are very limited. This motivates our twofold
contribution to the existing literature: (1) We construct computationally efficient and
fully data-driven estimators and confidence bands for $f$, without making distributional
assumptions on the process $Z(t)$ or smoothness assumptions on $f$; (2) We assess the
quality of our data adaptive estimates theoretically and prove that both the estimators
and the confidence bands adapt to the unknown regularity of $f$. Moreover, we show
that our bands are, asymptotically in $n$, uniform in $f$.

In what follows we review the existing results in the literature and provide further 
motivation for our procedure. The problem of estimating $f$ from data generated from
(1) has been considered by a large number of authors, starting with Ramsay and
Silverman (2002, 2005) and Ruppert, Wand and Carroll (2003). The existing methods are
based on kernel smoothers [see, e.g., Zhang and Chen (2007), Yao (2007), Benko,
Härdle and Kneip (2009)], penalized splines [see, e.g., Ramsay and Silverman (2005)],
free-knot splines [see, e.g., Gervini (2006)], or ridge-type least squares estimates [see,
e.g., Rice and Silverman (1991)]. All resulting estimates depend on tuning parameters
that are method specific. Theoretical properties of these estimates of $f$ are still emerging,
and have only been established for non-adaptive choices of the respective tuning
parameters, that is, choices that require prior knowledge of the smoothness of $f$ [see,
e.g., Zhang and Chen (2007) and Gervini (2006)]. Although guidelines for data-driven
choices of these parameters are offered in all these works, the theoretical properties
of the resulting estimates are still open to investigation. In contrast, we suggest in
Section 2 below a computationally simple method based on thresholded least squares
estimators. Our method does not require any specification of the regularity of $f(t)$ or
$X(t)$ prior to estimation. We show via oracle inequalities that our estimators adapt to
this unknown regularity.

Whereas the estimation of the mean $f(t)$ of the process $X(t)$ is well understood, modulo
the technical and possibly computational issues raised above, the construction of
uniform confidence intervals for $f$ has not been investigated in this context, and in general
the construction of confidence bands for $f$ in model (1) seems to have received little
attention. Zhang and Chen (2007) and Yao (2007) construct kernel-type estimators
and show that they are asymptotically normal with mean $f(t)$ and variance $\Gamma(t,t)/n$.
Although not addressed explicitly in these works, one can use these results to build
confidence bands. This construction would require the estimation of $\Gamma(t,t)$, using for
instance the large body of work on the estimation of the covariance operator, based
on the Karhunen-Loève decomposition of the process $X(t)$ and the subsequent estimation
of the functional eigenfunctions and eigenvalues; see, e.g., Müller (2005) and Benko,
Härdle and Kneip (2009), who also comment on the possible instability of these
estimates and offer refined methods for improved performance. We offer an alternative
method in Section 2.5 below. Our procedure is computationally simple, avoids direct
estimation of the covariance matrix $\Gamma(t_j, t_k)$, $1 \le j, k \le m$, and leads to adaptive bands
that are uniform in the parameter $f$.

The rest of the paper is organized as follows. In Section 2.2 below we discuss thresh- 
olded least squares estimators in the functional data setting. Our emphasis is on hard 
threshold estimators, but we also discuss briefly the closely related soft threshold
estimators. In Section 2.3 we establish oracle inequalities for the fit of the estimators
which show that the estimates adapt to the unknown sparsity of the mean $f$. The
sparsity of $f$ is relative to a given approximating basis. In Section 2.4 we suggest
cross-validation for choosing the basis from a library of bases. Since each basis induces
an estimator, our procedure can be regarded as one of selecting an estimator from a
given list. We then establish an oracle inequality for the selected estimator, which shows
that the selected estimator performs, essentially, as well as the best estimator from the
list, in terms of mean squared error relative to the unknown $f$. In Section 2.5 we give
the construction of the confidence bands and prove that they have the desired coverage
probability. Section 3 contains a comprehensive simulation study that strongly supports
the theoretical merits of the method and indicates that our method compares
favorably with existing methods. The net merit of the proposed method is very visible
when the variance of the random noise $\varepsilon$ is at the same level as that of the stochastic
process $Z(t)$, and we discuss this in detail in Sections 3.2 and 3.3. All the proofs are
collected in the Appendix. 



2. Methodology

2.1. Preliminaries. As explained in the introduction, the aim of this paper is (a) to
estimate the mean $f(t)$ of the process $X(t)$ and (b) to construct confidence bands for
the mean $f(t)$. Our approach is based on thresholded least squares estimates obtained
relative to bases $\phi_1, \ldots, \phi_m$ that are orthonormal in $L^2(P_m)$, where $P_m$ is the empirical
measure that puts mass $1/m$ at each $t_j$. Thus, our bases satisfy

$$\frac{1}{m}\sum_{j=1}^m \phi_k(t_j)\phi_{k'}(t_j) = 1\{k = k'\}$$

for $1 \le k, k' \le m$. Examples include the Fourier, local trigonometric and Haar bases.

Since $\phi_1, \ldots, \phi_m$ is orthonormal in $L^2(P_m)$, each $X_i(t_j) = f(t_j) + Z_i(t_j)$ has the
decomposition

$$X_i(t_j) = \sum_{k=1}^m (\mu_k + A_{ik})\phi_k(t_j) \tag{2}$$

with

$$\mu_k = \frac{1}{m}\sum_{j=1}^m f(t_j)\phi_k(t_j) \tag{3}$$

and

$$A_{ik} = \frac{1}{m}\sum_{j=1}^m Z_i(t_j)\phi_k(t_j). \tag{4}$$

For ease of notation we suppress the dependence on $m$ in $\mu_k$ and $A_{ik}$. Under model
(1) the random variables $A_{1k}, \ldots, A_{nk}$, for each $k$, are independent and identically
distributed with mean zero and variance

$$\sigma_k^2 = E[A_{ik}^2] = \frac{1}{m^2}\sum_{j=1}^m\sum_{j'=1}^m \Gamma(t_j, t_{j'})\phi_k(t_j)\phi_k(t_{j'}); \tag{5}$$

in the special case where $\Gamma(s,t) = \tau^2 1\{s = t\}$ for all $s,t$, the variances $\sigma_k^2$ reduce
to $\sigma_k^2 = \tau^2/m$ for $k = 1,\ldots,m$. The coefficients $\mu_k$ determine the target vector
$(f(t_1), \ldots, f(t_m))'$ via the formula

$$f(t_j) = \sum_{k=1}^m \mu_k\phi_k(t_j), \tag{6}$$

for each $j$. We motivate below our proposed methods for inference on $(f(t_1), \ldots, f(t_m))'$.
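The discrete orthonormality condition and the formulas (3) and (6) for the coefficients can be checked numerically. The sketch below builds a discrete Haar basis on $m = 2^J$ grid points, normalized so that it is orthonormal with respect to the empirical measure. The function names and the particular Haar construction are illustrative assumptions, not the authors' implementation.

```python
import math

def haar_basis(m):
    """Rows phi[k][j] of a discrete Haar basis with (1/m) sum_j phi_k phi_k' = 1{k = k'}.
    m must be a power of 2."""
    phi = [[1.0] * m]              # constant function
    level = 1
    while level < m:
        block = m // level         # support length of each wavelet at this level
        for s in range(level):
            row = [0.0] * m
            for j in range(block):
                sign = 1.0 if j < block // 2 else -1.0
                row[s * block + j] = sign * math.sqrt(level)
            phi.append(row)
        level *= 2
    return phi

def coefficients(values, phi):
    """mu_k = (1/m) sum_j values[j] * phi_k(t_j), as in (3)."""
    m = len(values)
    return [sum(v * p for v, p in zip(values, row)) / m for row in phi]

def reconstruct(mu, phi):
    """f(t_j) = sum_k mu_k * phi_k(t_j), as in (6)."""
    m = len(phi[0])
    return [sum(mu[k] * phi[k][j] for k in range(len(mu))) for j in range(m)]

m = 8
phi = haar_basis(m)
f_vals = [math.exp(-((j + 1) / m - 0.5) ** 2) for j in range(m)]
mu = coefficients(f_vals, phi)
f_back = reconstruct(mu, phi)      # agrees with f_vals up to rounding
```

Because the basis has exactly $m$ elements, the reconstruction is exact on the grid; sparsity enters only once small coefficients are set to zero.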



2.2. Threshold-type estimators for functional data. Our procedure falls between 
two of the currently used strategies: averaging estimated individual trajectories and 
applying various smoothing methods to the entire data set. Our initial estimator of
$f(t_j)$ is a least squares estimator, which can be viewed as an average (over $n$) of weighted
values of the $Y_{ij}$'s. Our final estimator will be a truncated version of the least squares
estimator, with data dependent truncation levels determined from the entire data set.
We describe our procedure below. The least squares estimator, based on all observations,
of $\mu = (\mu_1, \ldots, \mu_k, \ldots, \mu_m)$, for $\mu_k$ defined by (3), is the vector
$\hat\mu = (\hat\mu_1, \ldots, \hat\mu_m)$ that minimizes

$$\frac{1}{n}\sum_{i=1}^n \frac{1}{m}\sum_{j=1}^m \Big\{ Y_{ij} - \sum_{k=1}^m \mu_k\phi_k(t_j) \Big\}^2$$

over $\mu \in \mathbb{R}^m$. Using the orthonormality property of the basis, the estimators $\hat\mu_k$ of $\mu_k$
are given by

$$\hat\mu_k = \frac{1}{n}\sum_{i=1}^n \hat\mu_{i,k}, \tag{7}$$

the sample average (over $n$) of $\hat\mu_{i,k}$, which in turn are the least squares estimators of
$\mu_{i,k} = \mu_k + A_{ik}$ based on the observations $Y_{ij}$ from the $i$-th curve only. Using again
the orthonormality property of the basis used for this fit, the estimators $\hat\mu_{i,k}$ of $\mu_{i,k}$ are
given by

$$\hat\mu_{i,k} = \frac{1}{m}\sum_{j=1}^m Y_{ij}\phi_k(t_j).$$

Recalling that each $Y_{ij}$ follows model (1), and using the representations (2) - (4), we
can further write $\hat\mu_{i,k}$ as

$$\hat\mu_{i,k} = \mu_k + A_{ik} + \frac{1}{m}\sum_{j=1}^m \varepsilon_{ij}\phi_k(t_j).$$

Since $\varepsilon_{ij}$ and $A_{ik}$ have mean zero and are independent across $i$ and $j$ we obtain, for
every $k$, that

$$E[\hat\mu_{i,k}] = \mu_k \quad\text{and}\quad \mathrm{Var}(\hat\mu_{i,k}) = \sigma_k^2 + \frac{\sigma_\varepsilon^2}{m}.$$

Similarly, we find that

$$\hat\mu_k = \mu_k + \frac{1}{n}\sum_{i=1}^n A_{ik} + \frac{1}{n}\sum_{i=1}^n \frac{1}{m}\sum_{j=1}^m \varepsilon_{ij}\phi_k(t_j)$$

with

$$E[\hat\mu_k] = \mu_k \quad\text{and}\quad \mathrm{Var}(\hat\mu_k) = \frac{\sigma_k^2}{n} + \frac{\sigma_\varepsilon^2}{mn}.$$



The initial (unbiased) estimator $\hat f_{LS}(t_j)$ of the mean function $f(t_j)$, based on the least
squares estimates $\hat\mu_k$, is simply

$$\hat f_{LS}(t_j) = \sum_{k=1}^m \hat\mu_k\phi_k(t_j), \quad 1 \le j \le m,$$

and its variance may be unnecessarily inflated by the presence of, possibly many, very
small estimates $\hat\mu_k$. This can be remedied by truncating the coefficients at a level that
takes into account both the variability of the measurement errors $\varepsilon$ and the variability of
the stochastic process $Z(t)$; this is the essential difference between truncated estimators
based on data generated as in (1) and their counterpart based only on independent
data in a standard nonparametric regression setting. We will use the truncation level
$\hat r_k$ given below and we will justify it theoretically and practically in the next sections.



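Let $z(\alpha)$ be the quantile corresponding to a $N(0,1)$ random variable, where $0 < \alpha < 1$ is
a given number. Define

$$\hat r_k = \hat r_k(\alpha) = \frac{z(\alpha/(2m))}{\sqrt{n}}\,(S_k + \delta) \tag{9}$$

for some small $\delta > 0$ (that is set to zero in practice), where $\hat r_k$ depends on

$$S_k^2 = \frac{1}{n-1}\sum_{i=1}^n (\hat\mu_{i,k} - \hat\mu_k)^2.$$

Notice that $S_k^2$ is a consistent and unbiased estimator of $\mathrm{Var}(\hat\mu_{i,k}) = \sigma_k^2 + \sigma_\varepsilon^2/m$ since the
$\hat\mu_{i,k}$, $1 \le i \le n$, are i.i.d. for fixed $1 \le k \le m$.

We will focus on hard threshold estimators of the coefficients $\mu_k$ and function $f$. They
are, respectively,

$$\hat\mu_k(\hat r_k) = \hat\mu_k 1\{|\hat\mu_k| > \hat r_k\}; \qquad \hat f(\hat r) = \sum_{k=1}^m \hat\mu_k(\hat r_k)\phi_k.$$

We will also consider, for completeness, the soft threshold estimators:

$$\tilde\mu_k(\hat r_k) = \mathrm{sgn}(\hat\mu_k)(|\hat\mu_k| - \hat r_k)_+; \qquad \tilde f(\hat r) = \sum_{k=1}^m \tilde\mu_k(\hat r_k)\phi_k.$$

The two estimators are closely related, as the coefficients $\tilde\mu_k(\hat r_k)$ and $\hat\mu_k(\hat r_k)$ differ by at
most $\hat r_k$: whenever $|\hat\mu_k| > \hat r_k$,

$$\tilde\mu_k(\hat r_k) = \hat\mu_k(\hat r_k) - \mathrm{sgn}(\hat\mu_k)\,\hat r_k,$$

and both coefficients are zero otherwise.

In the next section we discuss the goodness-of-fit of these estimates in terms of the
mean squared error and error in the supremum norm.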
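The estimation steps of this section can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: it assumes the threshold has the form $\hat r_k = z\,(S_k + \delta)/\sqrt{n}$ with $z$ the upper $\alpha/(2m)$ quantile of the standard normal distribution, and the names `threshold_fit`, `phi` and `Y` are hypothetical.

```python
import math
from statistics import NormalDist

def threshold_fit(Y, phi, alpha=0.05, delta=0.0):
    """Least squares coefficients, thresholds, and hard/soft thresholded coefficients.
    Y[i][j] is curve i at t_j; phi[k][j] is basis function k at t_j (orthonormal in L2(P_m))."""
    n, m, K = len(Y), len(Y[0]), len(phi)
    # Per-curve least squares coefficients: (1/m) sum_j Y_ij phi_k(t_j)
    mu_i = [[sum(y * p for y, p in zip(Yi, row)) / m for row in phi] for Yi in Y]
    # Pooled coefficients: averages over the n curves, as in (7)
    mu_hat = [sum(mu_i[i][k] for i in range(n)) / n for k in range(K)]
    # Sample standard deviation of the per-curve coefficients
    S = [math.sqrt(sum((mu_i[i][k] - mu_hat[k]) ** 2 for i in range(n)) / (n - 1))
         for k in range(K)]
    z = NormalDist().inv_cdf(1.0 - alpha / (2 * m))   # assumed quantile level
    r_hat = [z * (S[k] + delta) / math.sqrt(n) for k in range(K)]
    hard = [mu if abs(mu) > r else 0.0 for mu, r in zip(mu_hat, r_hat)]
    soft = [math.copysign(max(abs(mu) - r, 0.0), mu) for mu, r in zip(mu_hat, r_hat)]
    return mu_hat, r_hat, hard, soft

# Tiny hypothetical example with m = 2 grid points and the basis (1, 1), (1, -1).
phi = [[1.0, 1.0], [1.0, -1.0]]
Y = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [1.1, 0.9]]
mu_hat, r_hat, hard, soft = threshold_fit(Y, phi)
```

In this toy example the first coefficient is kept and the second, whose estimate is essentially zero, is removed by both rules; the soft rule additionally shrinks the retained coefficient by its threshold.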

2.3. Oracle inequalities for the estimators of the mean of a stochastic process.
To discuss the quality of the estimates given above relative to the mean $f$, we
first investigate their properties relative to a truncated version of $f$ and obtain the
desired results as a consequence. We motivate the truncation of $f$ below. Consider the
theoretical truncation level

$$r_k = r_k(\alpha) = \frac{z(\alpha/(2m))}{\sqrt{n}}\sqrt{\sigma_k^2 + \frac{\sigma_\varepsilon^2}{m}}, \tag{10}$$

where $z(\alpha)$ is the quantile corresponding to a $N(0,1)$ random variable and $0 < \alpha < 1$
is a given number. Notice that this is the population counterpart of the data based
levels given in (9) above. Define, for each $1 \le k \le m$,

$$\mu_k(r_k) = \mu_k 1\{|\mu_k| > r_k\}$$

and we write

$$f(r) = \sum_{k=1}^m \mu_k(r_k)\phi_k$$

for a truncated version of $f$. When the truncation $f(r)$ retains the main features of $f$,
it can be considered as the new target for inference. This is the approach we take in
the sequel. As an illustrative example, we consider the mean function

$$f(t) = 0.75\exp\{-64(t - 0.25)^2\} + 1.93\exp\{-256(t - 0.75)^2\},$$

shown in black in Figures 1(a) and 1(c). We project $(f(t_1), \ldots, f(t_m))'$ onto linear
subspaces in $\mathbb{R}^m$ generated by the Fourier and Haar basis functions, respectively, evaluated
at $m = 2^8 = 256$ equally spaced points in [0,1]. Many of the projection coefficients
$\mu_k = (1/m)\sum_{j=1}^m f(t_j)\phi_k(t_j)$ are close to zero for both bases. We consider the
truncation level $r_k$ given above with $\alpha = 0.05$, $n = 400$, $\sigma_\varepsilon^2 = 0.136$ and for $\sigma_k^2$ given
by (5) above corresponding to the Brownian Bridge process with covariance function
$\Gamma(s,t) = \min(s,t) - st$.

Figures 1(b) and 1(d) show $|\mu_k| 1\{|\mu_k| > r_k\}$ versus their index, for the Fourier and Haar
basis respectively. The reconstruction $f_r(t) = \sum_{k=1}^m \mu_k 1\{|\mu_k| > r_k\}\phi_k(t)$ is shown in red
in Figures 1(a) and 1(c), and is very close to $f(t)$ in both cases. Notice that only 11
coefficients are needed for this good reconstruction of $f$ via the Fourier basis, versus
92 via the Haar basis, as we reconstruct a differentiable function with differentiable
and non-differentiable basis functions, respectively. Following standard terminology in
non-parametric estimation, we refer to the fact that $f$ can be reconstructed well via
a smaller subset of the given collection of the basis functions by saying that $f$ has a
sparse representation relative to that basis.

In what follows we show that the thresholded estimates introduced in the section above
adapt to the sparsity of $f$. Of course, since $f$ is unknown, so is its sparsity relative to a
given basis. Nevertheless, we show that our estimators adapt to this unknown sparsity
in terms of their fit, and refer to these results as oracle inequalities. The type of oracle
inequalities that we establish below illustrates that the fit of our estimators depends
only on the estimation errors induced by the estimates of the non-zero coefficients of
a sparse representation of $f$ within a given basis. As our example indicates, and as is
the case in any non-parametric estimation problem, the overall quality of our estimator
will further depend on the choice of the basis used for estimation. We will therefore
complement the construction of our estimator with a basis selection step. We begin by
stating our results for a given basis.

Figure 1. Sparsity of the mean function $f$ relative to the Haar and Fourier bases. Panels:
(a) Fourier basis approximation; (b) corresponding nonzero coefficients; (c) Haar basis
approximation; (d) corresponding nonzero coefficients.

For both hard and soft threshold estimators we obtain estimation bounds on the fit at
the observation points $t_j$. We formulate our results in terms of the empirical supremum
norm $\|\cdot\|_{m,\infty}$ and $L_2$ norm $\|\cdot\|_{m,2}$ defined below. For any real function $g$, let

$$\|g\|_{m,\infty} = \max_{1 \le j \le m}|g(t_j)|; \qquad \|g\|_{m,2} = \sqrt{\frac{1}{m}\sum_{j=1}^m g^2(t_j)}.$$

All theorems and results of this article require that $E[|Z(t)|^2] < \infty$, for all $t$, and that
$E[\varepsilon^2] < \infty$. The next three theorems are proved for a given basis and the desired
probability $\alpha$. All estimates are based on the threshold level $\hat r_k = \hat r_k(\alpha)$ given in (9),
for a user specified value of $\alpha$. The following result establishes oracle inequalities for
the hard-threshold estimators. Define

$$\bar r_k = \frac{z(\alpha/(2m))}{\sqrt{n}}\Big(\sqrt{\sigma_k^2 + \frac{\sigma_\varepsilon^2}{m}} + 2\delta\Big), \tag{11}$$

which differs by $2\delta$ from $r_k$ defined above in (10), for a quantity $\delta$ that is arbitrarily
close to zero; this is needed for purely technical reasons, and for all practical purposes
$\bar r_k$ and $r_k$ can be considered the same.

Theorem 1. For all $m \ge 1$, $0 < \alpha < 1$ and $\delta > 0$,

$$\|\hat f(2\hat r) - f(r)\|_{m,\infty} \le 3\max_{1 \le k \le m}\|\phi_k\|_\infty \sum_{k=1}^m \bar r_k 1\{|\mu_k| > r_k\},$$

$$\|\hat f(2\hat r) - f(r)\|_{m,2} \le 3\sqrt{\sum_{k=1}^m \bar r_k^2 1\{|\mu_k| > r_k\}}$$

with probability at least $1 - \alpha$, as $n \to \infty$.

The same conclusion holds for the soft-threshold estimator:

Theorem 2. Let $\bar r_k$ be as in (11). For all $m \ge 1$, $0 < \alpha < 1$ and $\delta > 0$,

$$\|\tilde f(2\hat r) - f(r)\|_{m,\infty} \le 3\max_{1 \le k \le m}\|\phi_k\|_\infty \sum_{k=1}^m \bar r_k 1\{|\mu_k| > r_k\},$$

$$\|\tilde f(2\hat r) - f(r)\|_{m,2} \le 3\sqrt{\sum_{k=1}^m \bar r_k^2 1\{|\mu_k| > r_k\}}$$

with probability at least $1 - \alpha$, as $n \to \infty$.

Theorems 1 and 2 immediately yield results on the performance of these estimators
relative to the untruncated $f$. In particular, the following inequality holds with
probability at least $1 - \alpha$ as $n \to \infty$:

$$\|\hat g - f\|_{m,2} \le \|f(r) - f\|_{m,2} + 3\sqrt{\sum_{k=1}^m \bar r_k^2 1\{|\mu_k| > r_k\}} \tag{12}$$

for both estimators $\hat g = \hat f(2\hat r)$ and $\hat g = \tilde f(2\hat r)$. This follows directly from the above results
and the triangle inequality. The first term $\|f(r) - f\|_{m,2}$ can be viewed as the
approximation error or bias term, whereas the second term represents the estimation error or
standard deviation term. The bias term is unavoidable, and its size depends on the
basis choice. It suggests the need for an adaptive method that would select the basis
that is best suited for the unknown underlying mean function $f$. We discuss this in
Section 2.4 below.
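The triangle inequality step behind (12) can be written out explicitly. The sketch below assumes that the bounds of Theorems 1 and 2 are stated relative to the truncated mean $f(r)$ with levels $\bar r_k$, and writes $\hat g$ for either thresholded estimator; the last line uses only the orthonormality of the basis in $L^2(P_m)$ to express the bias term through the discarded coefficients.

```latex
\begin{align*}
\|\hat g - f\|_{m,2}
  &\le \|\hat g - f(r)\|_{m,2} + \|f(r) - f\|_{m,2} \\
  &\le 3\sqrt{\sum_{k=1}^m \bar r_k^2\, 1\{|\mu_k| > r_k\}} + \|f(r) - f\|_{m,2}, \\
\|f(r) - f\|_{m,2}^2
  &= \sum_{k=1}^m \mu_k^2\, 1\{|\mu_k| \le r_k\}.
\end{align*}
```

The last identity makes the dependence on the basis visible: the bias is small exactly when the coefficients below the truncation level carry little of the signal.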

Theorems 1 and 2 are a novel type of oracle inequalities for thresholded estimators, as
they guarantee the "in probability", rather than "on average", performance of the
estimator, at any probability level of interest $0 < \alpha < 1$. These properties hold
for our estimates, as they are constructed relative to variable threshold levels that
depend on $\alpha$. To the best of our knowledge, such results are new in the functional
data context. They are also new in the general non-parametric settings, where a
more traditional way to state the oracle properties of the estimators is in terms of the
expected mean squared error, see, for instance, Donoho and Johnstone (1995, 1998),
Wasserman (2006), Tsybakov (2009) and the references therein. For completeness, we
also give an assessment of our estimates in terms of the expected mean squared error
in Theorem 3 below, which restates Theorem 1 in terms of expected values. To avoid
technical clutter, we consider the toy estimator $\hat f(2r)$ in lieu of $\hat f(2\hat r)$. Recall the notation

Theorem 3. For all $m \ge 1$, $0 < \alpha < 1$ and $\delta > 0$, we have

$$E\|\hat f(2r) - f(r)\|_{m,2}^2 \le 2\sum_{k=1}^m \Big(\frac{\sigma_k^2}{n} + \frac{\sigma_\varepsilon^2}{mn} + 4r_k^2\Big)1\{|\mu_k| > r_k\} + 4\sum_{k=1}^m E\Big[\big(r_k^2 + (\hat\mu_k - \mu_k)^2\big)1\{|\hat\mu_k - \mu_k| > r_k\}\Big]. \tag{13}$$

In particular, if $Z$ is a Gaussian process and $\varepsilon_{ij}$ are Gaussian random variables,



(14) 



™ / 2 2 \ 

nfi2r) - hr)\\i,, < 2$:rf + ^+ 4rn/{i/x.i>M 

fc=l ^ ^ 

4q; / 2 f^fr 
m ^-^ V 'T- 



Theorem 3 shows that the expected mean squared error of our estimator also adapts
to the unknown sparsity of $f$, as indicated by the first term in either inequality (13) or
(14). The second term in these inequalities is essentially an average of the quantities
that constitute the first term. This is more evident from the closed form expression
(14), and shows that this second term is negligible relative to the first one, especially
for small values of $\alpha$.



2.4. Data adaptive basis selection. The results of the previous section make it 
clear that the basis choice influences both the bias and the variance of our estimates; 
the type of basis one uses for the fit can be regarded as the tuning parameter of our 
estimation procedure. We give below a data adaptive procedure of selection and show
in Theorem 4 below that the estimator based on the selected basis behaves essentially
as if the best basis for approximating the unknown $f$ was known in advance.

We select the basis via a cross-validation (data-splitting) technique, by randomly dividing
the $n$ discretized curves $\{(Y_{ij}, t_j),\ 1 \le j \le m\}$ into two equally sized groups. The first
sample $\{(Y_{ij}, t_j),\ 1 \le j \le m,\ i \in I_1\}$ is used for constructing various estimates, say $\hat g_\ell$,
$1 \le \ell \le L$, based on various bases, choices of $\alpha$ and thresholding methods (hard and
soft). The second sample (hold-out or validation sample) $\{(Y_{ij}, t_j),\ 1 \le j \le m,\ i \in I_2\}$
is used to select the optimal estimate $\hat g = \hat g_{\hat\ell}$ that minimizes the empirical risk

$$\frac{1}{n_2}\sum_{i \in I_2}\frac{1}{m}\sum_{j=1}^m \big(Y_{ij} - \hat g_\ell(t_j)\big)^2$$

over $\ell = 1, \ldots, L$. Here $I_2$ is the index set for the curves that are set aside to evaluate
the estimators and $n_2 = |I_2|$ is its cardinality.
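The selection step can be sketched as follows. This is an illustration, not the authors' code: it assumes the empirical risk is the average squared distance between the hold-out observations and a candidate estimate on the grid, and the function names are hypothetical.

```python
def empirical_risk(g, Y_holdout):
    """(1/n2) sum_i (1/m) sum_j (Y_ij - g(t_j))^2 over the hold-out curves."""
    m = len(g)
    per_curve = [sum((y - gj) ** 2 for y, gj in zip(Yi, g)) / m for Yi in Y_holdout]
    return sum(per_curve) / len(Y_holdout)

def select_estimate(candidates, Y_holdout):
    """Return the index of the candidate with the smallest hold-out risk, and all risks."""
    risks = [empirical_risk(g, Y_holdout) for g in candidates]
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks

# Hypothetical example: two candidate estimates evaluated on two hold-out curves.
Y_holdout = [[1.0, 2.0], [1.2, 1.8]]
candidates = [[0.0, 0.0], [1.1, 1.9]]
best, risks = select_estimate(candidates, Y_holdout)
```

In practice each candidate would be a thresholded estimate computed from the first half of the curves for one combination of basis, level $\alpha$ and thresholding rule.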

Theorem 4. Assume that $\tau_{\varepsilon,p} = E|\varepsilon|^p < \infty$ and



^ m 

fz,p = -VE|Z,(T,)r<oo 

nm ' ^ 



m 



for some $p > 2$. The minimizer $\hat g = \hat g_{\hat\ell}$ satisfies

1 ^ 2P/2c„ 

^llfl^-/lllo + 

KKL 



(15) E||^-/||^,<2 



min - fWl, + ^ + L— ^ (r,, + < + f^,, + f|/,^ 



for some constant $C_p \le 7.35p/\max\{1, \log(p)\}$.



Remark. Theorem 4 requires that the process $Z(t)$ and the random error $\varepsilon$ have moments
of order strictly larger than 2, which is still a very mild assumption.

The last term in the right hand side of (15) is of order $1/n$, making the sum of the
last two terms of order $1/n$. This can be regarded as the price to pay for using a data
adaptive procedure to select the appropriate basis. The factor 2 multiplying the right
hand side of (15) can be reduced to $1 + \beta$ at the cost of increasing the last two terms on
the right by a factor proportional to $1/\beta$, for $\beta > 0$ arbitrarily close to zero. To avoid
notational clutter we opted for using the constant 2. Therefore, Theorem 4 shows that
the basis selection process yields an estimate that is essentially as good as the best
estimate on the list, in terms of expected squared error. Since which is best cannot be
known in advance, as $f$ is unknown, the result of Theorem 4 can also be regarded as
an oracle inequality.

2.5. Confidence bands. In this section we will construct confidence bands for $f$ that
are uniform over the parameter space. We begin with the confidence band based on a
hard threshold estimator given below. Set

$$\tilde r_k = \frac{z(\alpha/(2m))}{\sqrt{n}}\,(S_k + 3\delta),$$

and notice that it differs by a multiple of $\delta$ from $\hat r_k$ given in (9) above. This is again needed for
technical reasons, as in practice $\delta$ can be set to zero.

Theorem 5. (1) For all $m \ge 1$, $0 < \alpha < 1$ and $\delta > 0$, the band

$$\hat f(\hat r)(t_j) \pm \sum_{k=1}^m (3\tilde r_k)|\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}, \quad 1 \le j \le m,$$

contains $\{f(2\bar r)(t_j),\ 1 \le j \le m\}$ with probability at least $1 - \alpha$, as $n \to \infty$.

(2) Moreover, if all non-zero coefficients $\mu_k$ exceed $2\bar r_k$ in absolute value, the band can be made smaller
by a factor 3:

$$\hat f(\hat r)(t_j) \pm \sum_{k=1}^m \tilde r_k|\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}, \quad 1 \le j \le m,$$

contains $\{f(2\bar r)(t_j),\ 1 \le j \le m\}$ with probability at least $1 - \alpha$, as $n \to \infty$.

We obtain similar results for the soft-threshold estimator.

Theorem 6. For all $m \ge 1$, $0 < \alpha < 1$ and $\delta > 0$, the band

$$\tilde f(\hat r)(t_j) \pm \sum_{k=1}^m (2\tilde r_k)|\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}, \quad 1 \le j \le m,$$

contains $\{f(2\bar r)(t_j),\ 1 \le j \le m\}$ with probability at least $1 - \alpha$, as $n \to \infty$.
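Once the coefficients and threshold levels are available, the bands of this section reduce to simple sums. The sketch below computes a hard-threshold band of the form "thresholded fit plus or minus a multiple of the retained threshold levels". It assumes the center uses the coefficients exceeding a level `r_hat`, the half-width uses a slightly larger level `r_tilde`, and `factor` is 3 or 1; these argument names and the toy numbers are hypothetical.

```python
def confidence_band(mu_hat, r_hat, r_tilde, phi, factor=3.0):
    """Lower and upper band limits at each grid point t_j (hard threshold version).
    Only coefficients with |mu_hat_k| > r_hat_k enter the center and the half-width."""
    K, m = len(phi), len(phi[0])
    keep = [abs(mu) > r for mu, r in zip(mu_hat, r_hat)]
    center = [sum(mu_hat[k] * phi[k][j] for k in range(K) if keep[k]) for j in range(m)]
    half = [sum(factor * r_tilde[k] * abs(phi[k][j]) for k in range(K) if keep[k])
            for j in range(m)]
    lower = [c - h for c, h in zip(center, half)]
    upper = [c + h for c, h in zip(center, half)]
    return lower, upper

# Toy example with m = 2 grid points and the basis (1, 1), (1, -1).
phi = [[1.0, 1.0], [1.0, -1.0]]
lower, upper = confidence_band(mu_hat=[1.0, 0.01], r_hat=[0.1, 0.1],
                               r_tilde=[0.12, 0.12], phi=phi)
```

No covariance estimate appears anywhere in the computation: the only data-dependent inputs are the pooled coefficients and the per-coefficient standard deviations that define the levels.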

Lemma 7 in the Appendix below and the remark following it show that the bands
have asymptotic probability $1 - \alpha$ uniformly in the parameter $f$ or, equivalently, in
the parameter $(\mu_1, \ldots, \mu_m)$. This rules out the possibility of exhibiting, for each $n$, a
"bad" parameter value $(\mu_1, \ldots, \mu_m)$ for which the coverage probability is much smaller
than $1 - \alpha$. This would have been the case had we based our construction on the
limiting distribution of the truncated estimators $\hat f$ or $\tilde f$, when the resulting confidence
bands cannot be expected to be uniform, as pointed out by, for instance, Genovese and
Wasserman (2008) and Wasserman (2006).

Typically, the price to pay for having uniform confidence bands is the width of the
band, which is necessarily larger than the width of a pointwise band, as a uniform
band needs to cover all unfavorable cases. However, if we restrict the space of the
parameters over which we require uniformity to spaces containing only those $\mu_k$ that
are above a small threshold, the width of our bands can be made significantly narrower,
as in (2) of Theorem 5 above. Theorems 5 - 6 above show that the average width of
the bands is a multiple (1, 2 or 3), depending on the estimator and type of uniformity,
of

$$\frac{1}{m}\sum_{j=1}^m \Big(\sum_{k=1}^m \tilde r_k|\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}\Big) = \sum_{k=1}^m \Big(\frac{1}{m}\sum_{j=1}^m |\phi_k(t_j)|\Big)\tilde r_k 1\{|\hat\mu_k| > \hat r_k\}.$$

Remark. From the expression above it is clear that the bands and their width adapt
to the unknown sparsity of $f$, as only the coefficients $\hat\mu_k$ above the given threshold $\hat r_k$
contribute to the band and they, in turn, estimate the true coefficients above a certain
threshold, which reflect the sparsity of $f$. Since sparsity is relative to a given basis,
in practice the construction of a confidence band is based on the best basis selected
from a library of bases, as discussed in Section 2.4 above. Finally, in order to obtain
uniform bands of reasonable width, we constructed confidence bands for the surrogate
$f(2\bar r)$ of $f$, as advocated by Genovese and Wasserman (2008) for bands in standard
nonparametric regression models based on non-functional data; this surrogate is based
on the best selected basis and will capture the main features of $f$, as illustrated in
Figure 1 of Section 2.2.



3. Numerical results 

3.1. Simulation design. We conducted our simulations for a combination of types
of stochastic processes, stationary and non-stationary, and differentiable and
non-differentiable mean functions. Specifically, we consider two stationary processes, AR(1)
and ARIMA(1,1), and two non-stationary processes, the Brownian Bridge (BB) and
the Brownian Motion (BM) on [0,1]. We consider two mean functions:
$f(t) = C_1\exp\{-64(t - 0.25)^2\} + C_2\exp\{-256(t - 0.75)^2\}$, referred to in the sequel as Signal
1, and $f(t) = C_3 1_{0.35<t<0.375} + C_4 1_{0.75<t<0.875}$, referred to as Signal 2. The constants
$C_1$ - $C_4$ will be varied to achieve various desired signal-to-noise ratios.

For our simulations we considered two popular families of bases, Fourier and Haar, each
known to have good approximation properties for functions in $L^2([0,1])$ belonging to
general smoothness classes, e.g. Sobolev classes. Both bases share the orthonormality
property $\sum_{j=1}^m \phi_k(t_j)\phi_{k'}(t_j) = m 1\{k = k'\}$, for $1 \le k, k' \le m$; see, e.g., Tsybakov
(2009) for an argument and a more detailed discussion of smoothness classes. Any
other bases with this property can be considered, and the qualitative and quantitative
points we illustrate here will remain essentially the same.

3.1.1. Simulation scenarios. We simulated $n$ curves for each of the eight combinations
(signal, stochastic process) above. Each curve $1 \le i \le n$ is observed at $m$ equally
spaced points $t_j \in [0,1]$ and the observations follow model (1),

$$Y_{ij} = f(t_j) + Z_i(t_j) + \varepsilon_{ij},$$

for $1 \le i \le n$ and $1 \le j \le m$. The measurement errors $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ are i.i.d. across $i$
and $j$. The parameters for simulating the AR(1) and ARIMA(1,1) processes are chosen
in order to achieve the following equivalences:

$$\mathrm{median}_t\{\mathrm{Var}[\text{Brownian Bridge}(t)]\} = \mathrm{Var}\{\mathrm{AR}(1)\},$$

$$\mathrm{median}_t\{\mathrm{Var}[\text{Brownian Motion}(t)]\} = \mathrm{Var}\{\mathrm{ARIMA}(1,1)\}.$$

This facilitates comparison between processes of different natures. Next, the variance
of the measurement error $\sigma_\varepsilon^2$ is chosen so that we have two cases: $\sigma^* = 1$ and $\sigma^* = 10$,
where

$\sigma^*$ =

When $\sigma^* = 1$ the variability of the measurement error is the same as that of the
stochastic process, whereas for $\sigma^* = 10$ the measurement errors become essentially
negligible. Figure 2 below shows realizations from each of the stochastic
processes with mean corresponding to Signal 1 and added noise corresponding to $\sigma^* = 1$
and 10, respectively.
We conducted simulations for different values of the signal-to-noise ratio (SNR). Since
the process $Z(t)$ is assumed independent of the measurement error, we define as a
measure of the noise $(\mathrm{Var}[Z(t)] + \sigma_\varepsilon^2)^{1/2}$ and the signal-to-noise ratio to be SNR =
Range$[f]/(\mathrm{Var}[Z(t)] + \sigma_\varepsilon^2)^{1/2}$, where Range$[f] = |\max_t f(t) - \min_t f(t)|$, $t \in [0,1]$.

Figure 2. Plots of Signal 1 + AR(1)/BB + Noise (top row) and of
Signal 1 + ARIMA(1,1)/BM + Noise (bottom row), $n = 50$, $m = 256$,
SNR = 4.25. Panels: (a) AR(1), $\sigma^* = 10$; (b) AR(1), $\sigma^* = 1$; (c) BB, $\sigma^* = 10$;
(d) BB, $\sigma^* = 1$; (e) ARIMA(1,1), $\sigma^* = 10$; (f) ARIMA(1,1), $\sigma^* = 1$;
(g) BM, $\sigma^* = 10$; (h) BM, $\sigma^* = 1$.
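The signal-to-noise ratio defined above is straightforward to compute. The sketch below evaluates it for a signal of the Signal 1 form on a grid of $m = 256$ points. As assumptions for illustration only, it borrows the constants 0.75 and 1.93 and the noise variance 0.136 from the example of Section 2.3, and summarizes the time-varying Brownian bridge variance $t(1-t)$ by its median over the grid.

```python
import math

def snr(f_values, var_z, sigma2_eps):
    """SNR = Range[f] / sqrt(Var[Z] + sigma_eps^2), with Range[f] = max f - min f."""
    signal_range = max(f_values) - min(f_values)
    return signal_range / math.sqrt(var_z + sigma2_eps)

m = 256
t = [(j + 1) / m for j in range(m)]
signal1 = [0.75 * math.exp(-64 * (s - 0.25) ** 2) + 1.93 * math.exp(-256 * (s - 0.75) ** 2)
           for s in t]
# Assumption: use the median over the grid of the Brownian bridge variance t(1 - t).
bb_var = sorted(s * (1 - s) for s in t)
median_var = 0.5 * (bb_var[m // 2 - 1] + bb_var[m // 2])
value = snr(signal1, median_var, sigma2_eps=0.136)
```

Rescaling the constants in the signal changes `value` proportionally, which is one way to reach a prescribed SNR.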

3.2. Simulation results: the fit of the estimates. In this subsection we considered
$n = 400$ and $m = 256$; we took significantly lower values of $m$ and $n$ in the
next subsection. We contrast the quality of the fit of our estimates with the simplest
estimate, the ensemble average of the observations $Y_{ij}$, and with estimates obtained
via 7 other methods previously proposed and studied in the literature. The
first three are obtained by applying, respectively, the following smoothing methods to
the entire data set, containing all $n$ curves: (1) Linear polynomial kernel smoothing
(Local Poly) with a global bandwidth, suggested by, among others, Müller (2005),
Yao, Müller and Wang (2005), Müller, Sen and Stadtmüller (2006), Yao (2007). We
use a plug-in bandwidth adapting the method developed by Ruppert, Sheather and
Wand (1995) to our case. We obtain the estimated bandwidth $\hat h_{plug}$ using dpill in R,
and the estimate $\hat f_{locpoly}(t)$ using locpoly in R; (2) Nadaraya-Watson kernel smoothing
(NWK) with a global bandwidth, discussed in a functional data setting by, e.g.,
Yao (2007). We used a Gaussian kernel, and a grid of possible bandwidth choices
$G = [0.002, 0.004, 0.008, 0.01, 0.02, 0.04, 0.08, 0.1]$. We computed the estimate $\hat f_{NWK}(t)$
using each bandwidth in the grid, $b \in G$. We report the results for the bandwidth $b \in G$
for which we obtained the smallest EMSE($\hat f_{NWK}$); (3) Smoothing splines, as suggested
by, for instance, Rice and Silverman (1991). We used order 4 B-spline basis functions
$B_k(t)$ with a knot placed at each design time point $t_j$, and the square of the second
derivative of $f(t)$ as the roughness penalty. The tuning parameter in the penalty term
is chosen by generalized cross-validation, leaving one curve out at a time. We implemented
the method using smooth.spline in R.

For the last four of the methods used for comparison we estimate $f(t)$ by averaging
smoothed versions of the individual trajectories. The reconstructions of the individual
curves were performed using: (5) Linear polynomial kernel (Global kernel) smoother
with a global bandwidth; (6) Linear polynomial kernel smoother (Local kernel) with a
local bandwidth, where the bandwidths are found using the plug-in algorithm proposed
in Seifert, Brockmann, Engel and Gasser (1994); (7) B-spline regression with roughness
penalty; (8) Fourier expansion regression with roughness penalty, as discussed in
Ramsay and Silverman (2005), Chapter 5.

We contrast the estimates above with our estimates. We consider hard threshold
estimates (HT) obtained by truncating the least squares estimates either at levels
$\hat r_k$ for each $k$, and denote the resulting estimate by HT(r), or at levels
$2\hat r_k$, to obtain HT(2r), for $\hat r_k$ given above, with $\delta = 0$. We have also
conducted extensive simulation experiments for the soft-thresholded estimates ST(r)
and ST(2r); in all cases we obtained inferior results to those obtained for the
hard-thresholded estimators, and owing to space limitations we do not report them in
what follows. These results were expected, as the soft-thresholded estimator shrinks
the least squares coefficients, and would need to be followed by a re-fitting step,
which essentially amounts to the hard thresholding procedure we analyze below.
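The difference between the two thresholding rules can be made concrete with a short sketch. The coefficient values and the common level 0.1 below are hypothetical numbers chosen for illustration; only the two rules themselves (keep-or-kill versus shrink-towards-zero) are taken from the text.

```python
import math


def hard_threshold(mu_hat, level):
    """Keep the coefficient unchanged if it exceeds the level in absolute value, else zero."""
    return mu_hat if abs(mu_hat) > level else 0.0


def soft_threshold(mu_hat, level):
    """Shrink the coefficient towards zero by the level; small coefficients become zero."""
    return math.copysign(max(abs(mu_hat) - level, 0.0), mu_hat)


# Hypothetical least squares coefficients and a hypothetical common level r_k = 0.1.
coefs = [1.0, -0.3, 0.05]
level = 0.1
ht_r = [hard_threshold(c, level) for c in coefs]         # analogue of HT(r)
ht_2r = [hard_threshold(c, 2 * level) for c in coefs]    # analogue of HT(2r)
st_r = [soft_threshold(c, level) for c in coefs]         # analogue of ST(r)
```

Both rules select the same coefficients at a given level, but the soft rule also biases every retained coefficient by the level. Re-fitting the retained coefficients by least squares removes that bias and returns the hard-thresholded values, which is the point made in the paragraph above.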

Tables 1 and 2 contain the estimated mean squared errors for all the competing
estimates, together with the hard-thresholded estimates; we also included the least
squares estimator as a basis of comparison. For brevity we only included the results
relative to $f(t)$ equal to Signal 1; we have obtained very similar results for Signal 2,
and do not report them here for space considerations. The SNR for these simulations
was set to 4.25, and we lower it substantially in the next section. Our results support
the following.

Table 1. EMSE results for Signal 1 for BB and AR(1)

√EMSE × 10^-8           Brownian Bridge                 AR(1)
[√MEDMSE × 10^-8]       σ* = 1          σ* = 10         σ* = 1          σ* = 10

Fourier Basis
  OLS                   29582 [27580]   21104 [18217]   30767 [30398]   22767 [22415]
  HT(r)                 18429 [15723]   17544 [14446]   16928 [16072]   15989 [15195]
  HT(2r)                20231 [17721]   18334 [15441]   23894 [23767]   21448 [19651]
Haar Basis
  OLS                   29493 [27537]   21092 [18208]   30672 [30302]   22749 [22389]
  HT(r)                 34820 [33551]   23808 [21495]   38204 [38064]   27852 [27449]
  HT(2r)                48011 [47158]   48879 [53945]   61629 [61684]   52427 [52232]
Pooled Curves
  Local Poly            22769 [20274]   20557 [17584]   22673 [22075]   20479 [20073]
  NWK                   23186 [20790]   20271 [17289]   22848 [22279]   20594 [20153]
  Smoothing Splines     21455 [18801]   20186 [17308]   20794 [20263]   19408 [19069]
  Ensemble Ave          29493 [27537]   21092 [18208]   30672 [30302]   22749 [22389]
Curve-by-Curve
  Global Kernel         47709 [46877]   23165 [20647]   31641 [31123]   20451 [20095]
  Local Kernel          36161 [34809]   21677 [19036]   27824 [27261]   20548 [20152]
  B-splines Regression  38532 [37384]   20181 [17291]   21413 [20899]   20897 [20531]
  Fourier Regression    39543 [38316]   31905 [30200]   38309 [38091]   30359 [30167]

Table 2. EMSE results for Signal 1 for BM and ARIMA(1,1)

√EMSE × 10^-8           Brownian Motion                   ARIMA(1,1)
[√MEDMSE × 10^-8]       σ* = 1          σ* = 10           σ* = 1            σ* = 10

Fourier Basis
  OLS                   50959 [44662]   37958 [29482]     53549 [52989]     41562 [40893]
  HT(r)                 35388 [26045]   34133 [24780]     31356 [30346]     30491 [29488]
  HT(2r)                37794 [29300]   35322 [24835]     42625 [41283]     44088 [39450]
Haar Basis
  OLS                   50815 [44564]   37938 [29419]     53399 [52749]     41543 [40867]
  HT(r)                 58862 [54324]   45587 [34722]     66936 [65883]     51689 [51020]
  HT(2r)                78181 [75281]   88327 [85689]     106709 [106957]   103036 [96124]
Pooled Curves
  Local Poly            40648 [32707]   37345 [28707]     41186 [40240]     38084 [37464]
  NWK                   41260 [33561]   36791 [27848]     41277 [40470]     38092 [37468]
  Smoothing Splines     38737 [30617]   36865 [27928]     38281 [37269]     36341 [35685]
  Ensemble Ave          50815 [44564]   37938 [29419]     53399 [52749]     41543 [40867]
Curve-by-Curve
  Global Kernel         83134 [80354]   43568 [36505]     48418 [47520]     38552 [38017]
  Local Kernel          62759 [58781]   39777 [31632]     45305 [44259]     38780 [38184]
  B-splines Regression  71699 [68131]   38657 [30346]     39255 [38373]     39874 [39140]
  Fourier Regression    66629 [62591]   54688 [49219]     64238 [64015]     51861 [51601]



Conclusions on the performance of the fit of the estimators. 

1. Our estimates, as expected, are sensitive to the basis choice. Since in this case 
the signal is differentiable, the best approximation and estimation is obtained via the 
Fourier basis. This supports, once more, the need for complementing any estimation 
procedure based on basis approximations with a basis selection step. 

2. If $\sigma^* = 10$, that is when the variance of the process dominates the variance of the
random errors, the threshold estimators based on the Fourier basis perform essentially 
the same as most of the competing estimators, though they are consistently slightly 
better. It should be noted though that the performance of the competing estimators 
depends crucially on the selection of the tuning parameters of the respective method; 
therefore improvements of these estimates may be possible, via a refined choice of the 
respective tuning parameters. This may become very involved computationally and 
difficult to analyze theoretically. In contrast, our computationally simple estimator 
is fully data-driven: the threshold levels are estimated from the data and our tuning 
parameter, the basis, can also be selected via a simple cross-validation method with 
proven optimality properties. 

3. If $\sigma^* = 1$ the difference between our estimator and the competing ones is more
pronounced, especially for the BM and ARIMA(1,1) processes, suggesting that this 
type of estimation is more robust against the variability in the data. As an additional 
remark, our experiments indicate that some of the estimators proposed in the literature 
may be outperformed by the simple least squares estimator based on all the data points, 
or even by the naive sample average, if the choice of their tuning parameters is not 
refined; for all our simulations we did choose these tuning parameters adaptively as 
explained above, but we did not attempt to improve upon the published guidelines on 
their selection. 

3.3. Simulation results: confidence bands. The literature on confidence bands
for the mean in model (1) is limited. One strategy is to propose an estimator of $f(t)$,
establish its asymptotic distribution, and use this to construct confidence intervals.
Results pertaining to the asymptotic distribution of estimators of $f(t)$ in model (1)
are also limited; one exception is Zhang and Chen (2007), who studied the asymptotic
distribution of the mean estimator $\hat f_{AK}(t)$ obtained by averaging kernel smoothed
individual trajectories.

The coverage of the bands obtained via such a strategy will be affected by the accuracy 
of the estimators of the asymptotic variance. To investigate this effect we consider three 
bands of the form: 

Band 1: $V(t) = \Gamma(t,t)$, the theoretical variance function of the process $Z(t)$.
Band 2: $V(t) = V_1(t)$, where $V_1(t)$ is estimated using functional principal components
analysis. That is, $V_1(t) = \sum_{r=1}^{R} \hat\lambda_r \hat\zeta_r(t)^2$, where the estimated eigenvalues
$\hat\lambda_r = \mathrm{Var}(\hat\xi_r)$ and the estimated eigenfunctions $\hat\zeta_r$ are computed as in Ramsay and
Silverman (2005), section 8.4.2, and using the fda package in R. For our simulations
we took $R = 10$.
Band 3: $V(t) = \hat V(t) = \{1/(n-1)\} \sum_{i=1}^{n} \{Y_i(t) - \bar Y(t)\}^2$.

We compare these bands with the proposed bands constructed as in Theorem 5 (1) and
(2), with Fourier basis functions $\phi_k$, $1 \le k \le m$. We also investigate bands based on
the untruncated least squares estimate: $\hat f(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)|$, $1 \le j \le m$. The
signal-to-noise ratio was set to 1.5 and 2.2, respectively. Tables 3 and 4 summarize
the results for the AR(1) and BB processes and Signal 1. We obtained results similar
in spirit for all our other cases, and do not include them here for brevity. The entries
in these tables are the widths of the confidence bands followed, in parentheses, by
their empirical coverage. Recall that we are interested in simultaneous (over all
$t_j$'s) coverage, and so the empirical coverage is given by the relative frequency over
simulations of

\[ \prod_{j=1}^{m} 1\{ |f(t_j) - \hat g(t_j)| \le B_j \} \tag{17} \]

for the various estimators $\hat g$ with their corresponding width $B_j$ at $t_j$. For instance,
$\hat g = \hat f_{(\hat r)}$ and $B_j = \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$ correspond to our first proposed
band in Tables 3 and 4. For our simulations we took $\alpha = 0.05$, and we therefore expect
our bands to have at least 95% coverage. The results presented in Tables 3 and 4 below
support the following.

Table 3. Width (Coverage) for confidence bands. Scenario: Signal 1,
AR(1), $m = 2^6 = 64$, $t \in [0, 1]$

500 sims                             signal-to-noise = 1.5        signal-to-noise = 2.2
                                     n = 75        n = 100        n = 40        n = 50

$\sigma^* = 10$
Bands based on asymp. normality
  Band 1                             0.34 (0.97)   0.29 (0.97)    0.46 (0.98)   0.41 (0.97)
  Band 2                             0.31 (0.87)   0.26 (0.87)    0.42 (0.86)   0.37 (0.86)
  Band 3                             0.32 (0.94)   0.27 (0.94)    0.43 (0.94)   0.39 (0.94)
Proposed Bands
  $\hat f_{(\hat r)}(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     0.44 (0.96)   0.40 (1.00)    0.65 (0.97)   0.61 (0.99)
  $\hat f_{(\hat r)}(t_j) \pm 3 \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     1.31 (1.00)   1.20 (1.00)    1.94 (1.00)   1.84 (1.00)
  $\hat f(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)|$
                                     1.89 (1.00)   1.64 (1.00)    2.58 (1.00)   2.32 (1.00)

$\sigma^* = 1$
Bands based on asymp. normality
  Band 1                             0.34 (0.25)   0.29 (0.08)    0.46 (0.35)   0.41 (0.21)
  Band 2                             0.33 (0.22)   0.29 (0.08)    0.45 (0.30)   0.41 (0.22)
  Band 3                             0.33 (0.25)   0.28 (0.08)    0.45 (0.36)   0.41 (0.26)
Proposed Bands
  $\hat f_{(\hat r)}(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     0.56 (0.99)   0.52 (1.00)    0.84 (0.99)   0.78 (1.00)
  $\hat f_{(\hat r)}(t_j) \pm 3 \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     1.68 (1.00)   1.55 (1.00)    2.53 (1.00)   2.35 (1.00)
  $\hat f(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)|$
                                     3.15 (1.00)   2.73 (1.00)    4.29 (1.00)   3.85 (1.00)
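The simultaneous coverage criterion in (17) can be sketched in a few lines. The function names and the toy numbers are hypothetical; the only ingredient taken from the text is that a band counts as covering only if the true mean lies inside it at every design point, and that the empirical coverage is the fraction of simulated bands for which this happens. The inequality is taken to be non-strict here, which is an assumption.

```python
def covers_uniformly(f_true, center, half_width):
    """Indicator from (17): 1 only if |f(t_j) - g(t_j)| <= B_j at every design point."""
    return all(abs(f - g) <= b for f, g, b in zip(f_true, center, half_width))


def empirical_coverage(f_true, bands):
    """Relative frequency, over simulated (center, half_width) pairs, of uniform coverage."""
    hits = sum(covers_uniformly(f_true, c, w) for c, w in bands)
    return hits / len(bands)


# Toy illustration with m = 3 design points and two simulated bands.
f_true = [0.0, 1.0, 0.0]
bands = [
    ([0.1, 0.9, 0.0], [0.2, 0.2, 0.2]),   # covers f at all three points
    ([0.1, 0.5, 0.0], [0.2, 0.2, 0.2]),   # misses only the peak at the middle point
]
coverage = empirical_coverage(f_true, bands)
```

The second toy band shows why uniform coverage is demanding: a miss at a single point, such as the peak of the signal, is enough for the whole band to count as a failure.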

Conclusion on the proposed confidence bands

1. The coverage of the competing bands, based on the asymptotic normality of the
estimators, depends crucially on the ratio of $\sigma_\varepsilon^2$, the variance of the random errors
$\varepsilon$, to the variance of the process, as measured by $\sigma^*$ defined in (16) above. This is
expected, as the variance of the limiting distribution of these estimates depends only
on $\Gamma(t,t)$, and is independent of $\sigma_\varepsilon^2$, whose contribution vanishes asymptotically.
Therefore, bands that only take $\Gamma(t,t)$ into account cannot adapt to relatively large
contributions of $\varepsilon$ to the overall variability, as quantified, in our simulations, by
$\sigma^* = 1$.

Table 4. Width (Coverage) for confidence bands. Scenario: Signal 1,
BB, $m = 2^6 = 64$, $t \in (0, 1)$

S = 500 sims                         signal-to-noise = 1.5        signal-to-noise = 2.2
                                     n = 125       n = 150        n = 75        n = 100

$\sigma^* = 10$
Competing Bands based on asymp. normality
  Band 1                             0.23 (0.95)   0.21 (0.92)    0.30 (0.93)   0.26 (0.90)
  Band 2                             0.23 (0.91)   0.21 (0.87)    0.29 (0.88)   0.25 (0.83)
  Band 3                             0.23 (0.93)   0.21 (0.90)    0.29 (0.90)   0.26 (0.86)
Proposed Bands
  $\hat f_{(\hat r)}(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     0.33 (0.95)   0.31 (0.98)    0.50 (0.94)   0.46 (0.99)
  $\hat f_{(\hat r)}(t_j) \pm 3 \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     0.99 (0.99)   0.94 (0.99)    1.51 (0.99)   1.37 (1.00)
  $\hat f(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)|$
                                     1.14 (1.00)   1.04 (1.00)    1.46 (1.00)   1.27 (1.00)

$\sigma^* = 1$
Competing Bands based on asymp. normality
  Band 1                             0.23 (0.00)   0.21 (0.00)    0.30 (0.00)   0.26 (0.00)
  Band 2                             0.25 (0.00)   0.23 (0.00)    0.32 (0.00)   0.28 (0.00)
  Band 3                             0.24 (0.00)   0.22 (0.00)    0.31 (0.01)   0.27 (0.00)
Proposed Bands
  $\hat f_{(\hat r)}(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     0.47 (0.99)   0.44 (1.00)    0.63 (0.99)   0.55 (1.00)
  $\hat f_{(\hat r)}(t_j) \pm 3 \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}$
                                     1.41 (1.00)   1.32 (1.00)    1.88 (1.00)   1.66 (1.00)
  $\hat f(t_j) \pm \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)|$
                                     2.22 (1.00)   2.02 (1.00)    2.86 (1.00)   2.48 (1.00)



Both Tables 3 and 4 show this phenomenon: the bands based on asymptotic normality
always become narrower as $n$ increases. However, since their width does not take into
account $\sigma_\varepsilon^2$, these bands lose completely (bottom of Tables 3 and 4) the excellent
behavior they exhibit when $\sigma^* = 10$ (top of Tables 3 and 4). Figure 3 (a) shows that
for $\sigma^* = 1$ the peak of $f$ is never covered by the band, which is the reason for the
extremely poor uniform coverage; we recall that the coverage is computed via (17). In
contrast, the first of our proposed bands is extremely robust to these variations, at the
price of being slightly wider, as presented in Figure 3 (b) - (c).



2. Tables 3 and 4 support the expected fact that as the signal-to-noise ratio increases,
our bands have good coverage for smaller sample sizes, as small as $n = 40$; since our
bands are conservative we occasionally have 100% coverage. The influence of $\sigma^*$ is,
however, prevalent: even if the signal is stronger, the competing Bands 1-3 still cannot
offer the required 95% coverage for $\sigma^* = 1$, not even the ideal Band 1, which
uses the theoretical variance of the processes. The differences between Band 2
and Band 3 illustrate the effect of estimating this variance on the coverage of the band.

3. The first of our proposed bands is uniform over the space of parameters that are
larger than the noise level quantified by $r_k$, $1 \le k \le m$. It is the least conservative of
our bands, and is the one we suggest for use in practice. The trade-off between its width
and its robustness against the variability in the data makes it a strong competitor to
the narrower, but less robust Bands 1-3. The second of our bands is uniform over the
whole parameter space and is therefore conservative and necessarily wide; nevertheless
it is always smaller, sometimes by almost a factor of two, than the extreme case of a
naive uniform band based only on the untruncated least squares estimator.



Appendix A. Proofs 

Recall that we have introduced the following truncation levels which we repeat here 
for ease of reference. 



'n \ m \/n 



n V m y/n 

Recall that the quantities $\mu_k$ and $\hat\mu_k$ are given respectively by (1) and (3).

Lemma 7. Set 

m 
k=l 

Then 



\[ \lim_{\delta \downarrow 0} \liminf_{n \to \infty} \inf_{\mu_1, \ldots, \mu_m} P(\Omega_{m,n,\delta}) \ge 1 - \alpha \]

for all $m$.



Remark. Lemma 7 above is central to our analysis. The statements of Theorems 1-2
and 5-6 hold on the random set $\Omega_{m,n,\delta}$. Lemma 7 shows that the probability of this
set is larger than $1 - \alpha$, asymptotically in $n$ and uniformly over $(\mu_1, \ldots, \mu_m)$, for any
$m \ge 1$.

Proof of Lemma 7. Set $\bar A_k = (1/n) \sum_{i=1}^{n} A_{ik}$ and $\bar E_k = (1/n) \sum_{i=1}^{n} E_{ik}$, based on $A_{ik}$
defined above and

^ m 



Then we can write for each fixed $m$,

\[ S_k^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\hat\mu_{i,k} - \hat\mu_k)^2 = \frac{1}{n-1} \sum_{i=1}^{n} \{ (A_{ik} - \bar A_k) + (E_{ik} - \bar E_k) \}^2 \]

and consequently, $S_k^2 \to \sigma_k^2 + \sigma_\varepsilon^2/m$ almost surely, as $n \to \infty$. By the Bonferroni bound,
the Central Limit Theorem, Kolmogorov's Strong Law of Large Numbers and Slutsky's
Lemma, we have

\[ \lim_{\delta \downarrow 0} \liminf_{n \to \infty} \inf_{\mu_1, \ldots, \mu_m} P(\Omega_{m,n,\delta}) \ge 1 - \alpha \]

for all $m$, which is the desired result. ■
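The role of the Bonferroni bound can be illustrated numerically. The sketch below assumes that the estimated level has the form $z(\alpha/(2m)) S_k/\sqrt{n}$ with $\delta = 0$, where $z(\alpha/(2m))$ is the upper $\alpha/(2m)$ quantile of the standard normal distribution; this closed form is an assumption suggested by the normalization used in the proofs, not a formula quoted from the definitions above, and the function names and example coefficients are hypothetical.

```python
import math
from statistics import NormalDist, variance


def bonferroni_quantile(alpha, m):
    """Upper alpha/(2m) quantile of N(0,1), so m two-sided tail events sum to alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * m))


def estimated_level(coefs_k, alpha, m):
    """Assumed form of the estimated level: z(alpha/(2m)) * S_k / sqrt(n).

    coefs_k holds the n per-curve coefficients for one basis function; S_k^2 is
    their sample variance with the 1/(n-1) normalization, as in the proof.
    """
    n = len(coefs_k)
    s_k = math.sqrt(variance(coefs_k))
    return bonferroni_quantile(alpha, m) * s_k / math.sqrt(n)


alpha, m = 0.05, 64
z = bonferroni_quantile(alpha, m)
# Union bound over m coefficients: m * P(|N(0,1)| > z) equals alpha.
union_bound = m * 2.0 * (1.0 - NormalDist().cdf(z))
level = estimated_level([0.9, 1.1, 1.0, 1.2, 0.8], alpha, m)  # hypothetical coefficients
```

Splitting $\alpha$ evenly over the $m$ coefficients is what makes the event hold simultaneously for all $k$ with probability at least $1 - \alpha$ in the limit, at the price of a quantile that grows slowly with $m$.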



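A.1. Proof of Theorem 1. We first observe that

\[ \| \hat f_{(2\hat r)} - f_{(r)} \|_{m,\infty} = \max_{1 \le j \le m} \Big| \sum_{k=1}^{m} (\hat\mu_k(2\hat r_k) - \mu_k(r_k)) \phi_k(t_j) \Big| \le \max_{1 \le k \le m} \|\phi_k\|_\infty \sum_{k=1}^{m} |\hat\mu_k(2\hat r_k) - \mu_k(r_k)| \]

and that

\[ \| \hat f_{(2\hat r)} - f_{(r)} \|_{m,2}^2 = \sum_{k=1}^{m} (\hat\mu_k(2\hat r_k) - \mu_k(r_k))^2. \]

It remains to show that on the event $\Omega_{m,n,\delta}$,

\[ |\hat\mu_k(2\hat r_k) - \mu_k(r_k)| \le 3 \hat r_k 1\{|\mu_k| > r_k\} \tag{18} \]

holds for all $1 \le k \le m$, and any $m \ge 1$. Indeed, on $\Omega_{m,n,\delta}$,

\[
\begin{aligned}
|\hat\mu_k(2\hat r_k) - \mu_k(r_k)|
&= |\hat\mu_k 1\{|\hat\mu_k| > 2\hat r_k\} - \mu_k 1\{|\mu_k| > r_k\}| \\
&\le |\hat\mu_k - \mu_k| 1\{|\mu_k| > r_k\} + |\hat\mu_k| \, |1\{|\hat\mu_k| > 2\hat r_k\} - 1\{|\mu_k| > r_k\}| \\
&\le r_k 1\{|\mu_k| > r_k\} + |\hat\mu_k| \, |1\{|\hat\mu_k| > 2\hat r_k\} - 1\{|\mu_k| > r_k\}|.
\end{aligned}
\]

For the second term, we consider two cases: $|\mu_k| \le r_k$ and $|\mu_k| > r_k$, and we get

\[
\begin{aligned}
|\hat\mu_k| \, |1\{|\hat\mu_k| > 2\hat r_k\} - 1\{|\mu_k| > r_k\}|
&= |\hat\mu_k| 1\{|\hat\mu_k| > 2\hat r_k\} 1\{|\mu_k| \le r_k\} + |\hat\mu_k| 1\{|\hat\mu_k| \le 2\hat r_k\} 1\{|\mu_k| > r_k\} \\
&\le |\hat\mu_k| 1\{|\hat\mu_k| \le 2\hat r_k\} 1\{|\mu_k| > r_k\} \\
&\le 2\hat r_k 1\{|\mu_k| > r_k\},
\end{aligned}
\]

where we used in the penultimate inequality the fact that $|\mu_k| \le r_k$ implies that
$|\hat\mu_k| \le 2\hat r_k$. Combining the two preceding displays, and using $r_k \le \hat r_k$, yields (18).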
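Inequality (18) can be checked numerically by brute force. The sketch below assumes that the event used in the proof provides the two facts invoked above, namely $|\hat\mu_k - \mu_k| \le r_k$ and $r_k \le \hat r_k$; the sampling ranges, the function names and the tolerance are arbitrary choices made for illustration.

```python
import random


def hard(x, level):
    """Hard thresholding: x * 1{|x| > level}."""
    return x if abs(x) > level else 0.0


def check_inequality_18(trials=20000, seed=1):
    """Random search for a violation of (18) under the assumed event conditions."""
    rng = random.Random(seed)
    for _ in range(trials):
        mu = rng.uniform(-3.0, 3.0)
        r = rng.uniform(0.01, 1.0)
        r_hat = r * rng.uniform(1.0, 2.0)      # assumed: r_k <= r_hat_k
        mu_hat = mu + rng.uniform(-r, r)       # assumed: |mu_hat_k - mu_k| <= r_k
        lhs = abs(hard(mu_hat, 2.0 * r_hat) - hard(mu, r))
        rhs = 3.0 * r_hat if abs(mu) > r else 0.0
        if lhs > rhs + 1e-12:                  # small tolerance for rounding
            return False
    return True


no_violation_found = check_inequality_18()
```

A search of this kind cannot replace the proof, but it is a cheap guard against sign or indicator mistakes when the inequality is transcribed or adapted.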



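A.2. Proof of Theorem 2. It suffices to show that on the event $\Omega_{m,n,\delta}$,

\[ |\hat\mu_k(2\hat r_k) - \mu_k(r_k)| \le 3 \hat r_k 1\{|\mu_k| > r_k\} \tag{19} \]

holds for all $1 \le k \le m$. The remainder of the proof is identical to that of Theorem 1.

To show (19), we begin by observing that

\[
\begin{aligned}
&|\mathrm{sgn}(\hat\mu_k)(|\hat\mu_k| - 2\hat r_k)_+ - \mu_k 1\{|\mu_k| > r_k\}| \\
&\qquad \le |\mathrm{sgn}(\hat\mu_k) - \mathrm{sgn}(\mu_k)| \, |\mu_k| 1\{|\mu_k| > r_k\} + |(|\hat\mu_k| - 2\hat r_k)_+ - |\mu_k| 1\{|\mu_k| > r_k\}|.
\end{aligned}
\]

The first term on the right is zero:

\[
\begin{aligned}
|\mathrm{sgn}(\hat\mu_k) - \mathrm{sgn}(\mu_k)| \, |\mu_k| 1\{|\mu_k| > r_k\}
&= 1\{\mu_k > r_k\} |\mathrm{sgn}(\hat\mu_k) - 1| \, |\mu_k| + 1\{\mu_k < -r_k\} |\mathrm{sgn}(\hat\mu_k) + 1| \, |\mu_k| \\
&= 0,
\end{aligned}
\]

since $\mu_k > r_k$ (and $|\hat\mu_k - \mu_k| \le r_k$) implies that $\hat\mu_k > 0$ and $\mathrm{sgn}(\hat\mu_k) = 1$, and, in a
similar way, $\mu_k < -r_k$ implies that $\hat\mu_k < 0$ and $\mathrm{sgn}(\hat\mu_k) = -1$. Consequently,

\[ |\mathrm{sgn}(\hat\mu_k)(|\hat\mu_k| - 2\hat r_k)_+ - \mu_k 1\{|\mu_k| > r_k\}| \le |(|\hat\mu_k| - 2\hat r_k)_+ - |\mu_k| 1\{|\mu_k| > r_k\}|. \]

Next, since $|\mu_k| \le r_k$ implies that $|\hat\mu_k| \le 2 r_k \le 2\hat r_k$, we obtain

\[
\begin{aligned}
|(|\hat\mu_k| - 2\hat r_k)_+ - |\mu_k| 1\{|\mu_k| > r_k\}|
&\le |\mu_k| 1\{|\mu_k| > r_k, |\hat\mu_k| \le 2\hat r_k\} + \big| |\hat\mu_k| - |\mu_k| - 2\hat r_k \big| 1\{|\mu_k| > r_k, |\hat\mu_k| > 2\hat r_k\} \\
&\le 3\hat r_k 1\{|\mu_k| > r_k\}.
\end{aligned}
\]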



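Combination of all these bounds gives (19).

A.3. Proof of Theorem 5. We first notice that on the event $\Omega_{m,n,\delta}$, we have

\[ |\hat\mu_k(\hat r_k) - \mu_k(2\hat r_k)| \le 3 \hat r_k 1\{|\hat\mu_k| > \hat r_k\} \tag{20} \]

for all $1 \le k \le m$. This follows essentially from interchanging the roles of $\hat\mu_k$ and $\mu_k$
and $r_k$ and $\hat r_k$ in the proof of (18):

\[
\begin{aligned}
|\mu_k(2\hat r_k) - \hat\mu_k(\hat r_k)|
&\le |\hat\mu_k - \mu_k| 1\{|\hat\mu_k| > \hat r_k\} + |\mu_k| \, |1\{|\mu_k| > 2\hat r_k\} - 1\{|\hat\mu_k| > \hat r_k\}| \\
&\le r_k 1\{|\hat\mu_k| > \hat r_k\} + |\mu_k| \, |1\{|\mu_k| > 2\hat r_k\} - 1\{|\hat\mu_k| > \hat r_k\}| \\
&\le \hat r_k 1\{|\hat\mu_k| > \hat r_k\} + |\mu_k| \, |1\{|\mu_k| > 2\hat r_k\} - 1\{|\hat\mu_k| > \hat r_k\}| \\
&= \hat r_k 1\{|\hat\mu_k| > \hat r_k\} + |\mu_k| 1\{|\mu_k| > 2\hat r_k\} 1\{|\hat\mu_k| \le \hat r_k\} + |\mu_k| 1\{|\mu_k| \le 2\hat r_k\} 1\{|\hat\mu_k| > \hat r_k\} \\
&\le \hat r_k 1\{|\hat\mu_k| > \hat r_k\} + 2\hat r_k 1\{|\hat\mu_k| > \hat r_k\} \\
&= 3\hat r_k 1\{|\hat\mu_k| > \hat r_k\}.
\end{aligned}
\]

Since

\[
\begin{aligned}
&P\Big\{ |\hat f_{(\hat r)}(t_j) - f_{(2\hat r)}(t_j)| \le 3 \sum_{k=1}^{m} \hat r_k |\phi_k(t_j)| 1\{|\hat\mu_k| > \hat r_k\}, \ 1 \le j \le m \Big\} \\
&\qquad \ge P\big\{ |\hat\mu_k(\hat r_k) - \mu_k(2\hat r_k)| \le 3\hat r_k 1\{|\hat\mu_k| > \hat r_k\}, \ 1 \le k \le m \big\} \\
&\qquad \ge P(\Omega_{m,n,\delta}),
\end{aligned}
\]

the first claim follows from Lemma 7.

The second claim follows from

\[
\begin{aligned}
|\hat\mu_k(\hat r_k) - \mu_k(2\hat r_k)|
&\le \hat r_k 1\{|\hat\mu_k| > \hat r_k\} + |\mu_k| 1\{|\mu_k| \le 2\hat r_k\} 1\{|\hat\mu_k| > \hat r_k\} \\
&= \hat r_k 1\{|\hat\mu_k| > \hat r_k\}
\end{aligned}
\]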

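if we have that all $\mu_k = 0$ if $|\mu_k| \le 2\hat r_k$. ■

A.4. Proof of Theorem 6. The proof is similar to the one of Theorem 5, except that
we replace (20) by the following inequality, which holds on the event $\Omega_{m,n,\delta}$:

\[
\begin{aligned}
&\big| \mathrm{sign}(\hat\mu_k)(|\hat\mu_k| - \hat r_k)_+ - \mathrm{sign}(\mu_k) |\mu_k| 1\{|\mu_k| > 2\hat r_k\} \big| \\
&\qquad \le |\mathrm{sign}(\hat\mu_k) - \mathrm{sign}(\mu_k)| \, |\mu_k| 1\{|\mu_k| > 2\hat r_k\} + \big| (|\hat\mu_k| - \hat r_k)_+ - |\mu_k| 1\{|\mu_k| > 2\hat r_k\} \big| \\
&\qquad = \big| (|\hat\mu_k| - \hat r_k)_+ - |\mu_k| 1\{|\mu_k| > 2\hat r_k\} \big| \\
&\qquad \le \big| (|\hat\mu_k| - \hat r_k)_+ - |\mu_k| \big| 1\{|\mu_k| > 2\hat r_k\} + (|\hat\mu_k| - \hat r_k)_+ 1\{|\mu_k| \le 2\hat r_k\} \\
&\qquad = \big| |\hat\mu_k| - \hat r_k - |\mu_k| \big| 1\{|\mu_k| > 2\hat r_k\} 1\{|\hat\mu_k| > \hat r_k\} \\
&\qquad \qquad + (|\hat\mu_k| - \hat r_k) 1\{\hat r_k < |\hat\mu_k| \le 3\hat r_k\} 1\{|\mu_k| \le 2\hat r_k\} \\
&\qquad \le 2\hat r_k 1\{|\mu_k| > 2\hat r_k\} 1\{|\hat\mu_k| > \hat r_k\} + 2\hat r_k 1\{\hat r_k < |\hat\mu_k| \le 3\hat r_k\} 1\{|\mu_k| \le 2\hat r_k\} \\
&\qquad \le 2\hat r_k 1\{|\hat\mu_k| > \hat r_k\}. \quad ■
\end{aligned}
\]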
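The two coefficient-wise bounds behind the confidence bands, (20) for hard thresholding and the display above for soft thresholding, can be checked by the same kind of random search. As before, the sketch assumes that on the event of Lemma 7 one has $|\hat\mu_k - \mu_k| \le r_k \le \hat r_k$; the sampling scheme, names and tolerance are illustrative choices only.

```python
import math
import random


def hard(x, level):
    """Hard thresholding: x * 1{|x| > level}."""
    return x if abs(x) > level else 0.0


def soft(x, level):
    """Soft thresholding: sign(x) * (|x| - level)_+."""
    return math.copysign(max(abs(x) - level, 0.0), x)


def check_band_inequalities(trials=20000, seed=2):
    """Random search for violations of (20) and of its soft-threshold analogue."""
    rng = random.Random(seed)
    for _ in range(trials):
        r = rng.uniform(0.01, 1.0)
        r_hat = r * rng.uniform(1.0, 2.0)          # assumed: r_k <= r_hat_k
        mu_hat = rng.uniform(-4.0, 4.0)
        mu = mu_hat + rng.uniform(-r, r)           # assumed: |mu_hat_k - mu_k| <= r_k
        target = hard(mu, 2.0 * r_hat)             # mu_k * 1{|mu_k| > 2 r_hat_k}
        active = 1.0 if abs(mu_hat) > r_hat else 0.0
        if abs(hard(mu_hat, r_hat) - target) > 3.0 * r_hat * active + 1e-12:
            return False                           # (20) would be violated
        if abs(soft(mu_hat, r_hat) - target) > 2.0 * r_hat * active + 1e-12:
            return False                           # soft-threshold bound would be violated
    return True


bounds_hold = check_band_inequalities()
```

Both right-hand sides depend only on observable quantities ($\hat r_k$ and $\hat\mu_k$), which is what allows the bands to be computed from the data without estimating the covariance of the process.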



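A.5. Proof of Theorem 3. First observe that

\[
\begin{aligned}
E \| \hat f_{(2r)} - f_{(r)} \|_{m,2}^2
&= \sum_{k=1}^{m} E |\hat\mu_k(2r_k) - \mu_k(r_k)|^2 \\
&= \sum_{k=1}^{m} E \big[ (\hat\mu_k - \mu_k) 1\{|\mu_k| > r_k\} + \hat\mu_k (1\{|\hat\mu_k| > 2r_k\} - 1\{|\mu_k| > r_k\}) \big]^2 \\
&\le 2 \sum_{k=1}^{m} \Big( \frac{\sigma_k^2}{n} + \frac{\sigma_\varepsilon^2}{mn} \Big) 1\{|\mu_k| > r_k\}
 + \sum_{k=1}^{m} 2 E \hat\mu_k^2 1\{|\hat\mu_k| > 2r_k, |\mu_k| \le r_k\} \\
&\qquad + \sum_{k=1}^{m} 2 E \hat\mu_k^2 1\{|\hat\mu_k| \le 2r_k, |\mu_k| > r_k\} \\
&\le 2 \sum_{k=1}^{m} \Big( \frac{\sigma_k^2}{n} + \frac{\sigma_\varepsilon^2}{mn} \Big) 1\{|\mu_k| > r_k\}
 + \sum_{k=1}^{m} 4 E \big[ \mu_k^2 + (\hat\mu_k - \mu_k)^2 \big] 1\{|\hat\mu_k| > 2r_k, |\mu_k| \le r_k\} \\
&\qquad + \sum_{k=1}^{m} 2 (2r_k)^2 E 1\{|\hat\mu_k| \le 2r_k, |\mu_k| > r_k\}.
\end{aligned}
\]

Using the triangle inequality readily yields the first claim. For the second claim, it is
assumed that $(\hat\mu_k - \mu_k) / \sqrt{\sigma_k^2/n + \sigma_\varepsilon^2/(nm)}$ is $N(0, 1)$. Hence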

\ n nmj J\x\>z(a/(2m)) 

dx 

(a/(2m)) 



SO that 





+ 




j 


n 


nm J 


J\x 


-A 


+ 


^) 


a 


n 


nm j 


m 



k=l 



as claimed. 



A. 6. Proof of Theorem |4| For any real function that is independent of 1^^, i & I2 
we have 



S2{g) = E \S,{9)] = 11/ - ?IIL2 + - 5^r(T„T,) + a 



so that 



^2(^7)-^2(/) = |k-/|| 



m,2- 



Repeating the same arguments as in Lemma 2.3 of Wegkamp (2003), we obtain 



\9-f\\l,2 < + 



m,2 



+ max 



1 1 

2(1 + «)- E - E^^^^- + Z^{T,)}{ge{T,) - f{T,)} - a\\ge - f\\ 



no ^ — ' m 

i&h j=i 



2 

m,2 



for all $a > 0$ and all $k \ge 1$. Furthermore, using the inequality $2xy \le x^2/c + cy^2$ for
$c = (1 + a)/a$, $y = \|g_\ell - f\|_{m,2}$,



X 



n2'^ m .
we obtain



II?-/IIL2 < (l + a)||^,-/|| 



m,2 



t a 



1 1 



n2 '-^ m ^ 



{?,(T,)-/(T,)} 

-/||m,2 



< {l + a)\\g,-f\\l, 
+ max2-^ — 



+ max2-^ — 

k a 



1 1 

-E-E 

ieh 3=1 



Wde - f\\m,2 



By Rosenthal's inequality (see, e.g., Wegkamp 2003, page 262) and the Cauchy-Schwarz
and Jensen inequalities, we obtain



E 



no ^ — ^ m 

i&h i=i 



Wde - f\\m,2 



< CpUg max 



EE 



< Cpn2 ^ max I E 



-E' 

3=1 

m 

m ^-^ 

3=1 



{UT,)-f{T,)} 



■ij 



Wde - f\\m,2 



p/2 



m 



p 






E^ 








P/2\ 











m 

-E' 

3=1 



Wde - /IU,2 



IP 



< CpTi^^ max I — ) 

ieh j=i 



m 

E^^ E^«i 

.ieh j=i 



p/2^ 



In the same way. 



E 



i&h 3=1 



Cpn2 ^ max(n2Tz,p, {n2Tz,2Y^^)-
Using the same arguments as in the proofs of Lemma 2.5 and Theorem 2.2 in Wegkamp
(2003), we arrive at



(l + a)2 

max 

k a 



no — ' m 



< 



1 + a 



n 



+ (1 + a)K^ (f,,, + -al + fz,, + f'J^) ( 



1 + a 



m,2 



p/2 



and take a = 1 to obtain the result. 



□